Dynamics of Competitive Evolution on a Smooth Landscape 
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We study competitive DNA sequence evolution directed by in vitro protein binding. The steady- 
state dynamics of this process is well described by a shape-preserving pulse which decelerates and 
eventually reaches equilibrium. We explain this dynamical behavior within a continuum mean- 
field framework. Analytical results obtained on the motion of the pulse agree with simulations. 
Furthermore, finite population correction to the mean- field results are found to be insignificant. 
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Competitive evolution such as breeding has been prac- 
ticed for ages. With recent advances in molecular biology, 
this method is widely used to develop novel proteins and 
DNA sequences for a variety of applications Q . The ba- 
sic idea of competitive molecular evolution is straightfor- 
ward: in each generation, a number of molecules with cer- 
tain desired characteristics are selected from the popula- 
tion; they are then diversified (via point mutation and/or 
recombination 0) and amplified back to the original pop- 
ulation size. The "speed" of evolution as well as the final 
equilibrium distribution depend on a variety of factors 
such as the mutation rate, selection strength, molecule 
length, and population size. A systematic quantitative 
understanding of these dependencies is lacking thus far. 
Such understanding is not only of theoretical interest, but 
also helpful in improving the efficiency of the breeding 
processes. In this study, we develop a theoretical model 
for the simplest type of competitive evolution involving 
only point mutations on a smooth landscape. We achieve 
an understanding of this model with concepts and tech- 
niques developed in the study of front propagation [j| . 

To make the discussion concrete, we focus on the in 
vitro evolution of DNA sequences due to competitive 
binding to proteins. An example of such a system is the 
recent experiment of Dubertret et. al. 3 , where DNA se- 
quences are selected competitively according to their rel- 
ative affinities for the /ac-repressor protein. In this exper- 
iment, selection is accomplished by coating a beaker with 
Zac-repressor molecules followed by subsequent washing, 
so that only the strongly-bound sequences remain. Muta- 
tion and amplification are then accomplished by multiple 
stages of polymerase chain reaction |5j . While the exper- 
iment of Ref. Q easily accomplished the goal of finding 
the best binding sequence starting from a pool of ran- 
dom sequences in a few generations, the shortness of the 
binding sequence [20 base pairs (bp)] makes it difficult to 
explore the interesting dynamics of the competitive evo- 
lution process. In our study we consider the evolution 
process of Ref. Q applied to much longer sequences so 
that that the steady state dynamics can be examined. An 
example of such a system might be the histone-octamer, 
which is known to bind DNA sequences of 147 bp [(J . 



We consider a pool of N DNA sequences of length L. 
Each sequence S = {b\, b 2 , br.) of nucleotides bi is sub- 
ject to independent single-nucleotide mutations at a rate 
vo< 1 per nucleotide per generation. Selection is accom- 
plished through protein-DNA binding. Let the binding 
energy of a sequence S to the protein be Eg and let the 
fraction of such sequences in the pool be rig. Assuming 
thermodynamic equilibrium for the binding process, the 
selection function is simply the binding probability, given 
by the Fermi function Q P(Eg,fj,) = 1 / [1 + exp(E g - fi)] , 
where mu is the chemical potential and all energies are 
expressed in units of k^T. Here /i serves as a (soft) se- 
lection threshold, and is determined by the fraction <f> of 
DNA sequences that remain bound to the proteins after 
selection, i.e., J2§ ^i^g, ng — <j). It can be controlled 
by either the number of available proteins or, as in the 
experiment Q, by the washing strength. The fraction <j> 
can be varied from cf> < 1 (weak competition) to <j) > 
(strong competition). We define the evolution process it- 
eratively whereby in each round, N daughter sequences 
are chosen from the existing pool according to P(Eg, //), 
and then point mutation are introduced with rate vq to 
generate the new sequence pool. 

Finally we need to specify the binding energy Eg. We 
assume that each nucleotide taking part in the bind- 
ing contributes independently, and adopt a "two-state" 
model which assigns an energy penalty e (of the order 
of a few /cbT's) for each nucleotide which does not match 
the one the protein prefers. This form of binding energy 
has been shown to work reasonably well for specific sys- 
tems |8| and has been argued to hold for a wide class of 
regulatory proteins 0, E| . Given this energy model, a 
DNA sequence with r mismatches has a binding proba- 
bility P(r, r ) = I J [l + e £(r - ro) ] , where r = fi/e is the 
selection threshold in the "mismatch space" r. [llj |. 

The above evolution model is easily implemented on 
a computer. We fix three of the model parameters at 
N = 5 x 10 5 , L = 170 and v$ — 0.01 from here on, and 
vary only the selection strength through the choice of 
the selection stringency 4>- A typical simulation result for 
strong selection (<^> = 0.1) is shown as the space-time plot 
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FIG. 1: The space-time trajectory of the mismatch distribu- 
tion n(r, i) according to the competitive evolution model with 
4> — 0.1. The inset shows the distribution n(r,t) at genera- 
tions t = 10, 40, 70, 100, after the initial transient period and 
before the distribution reaches equilibrium at r ~ 0. These 
distributions overlap upon shift by their respective threshold 
r*o(t), indicating the shape-invariance of the pulse. 

of the mismatch distribution n{r,t) =^2g ng S(Eg — re) 
in Fig. n We see that the distribution quickly forms a 
shape-preserving pulse (see the inset of Fig. which 
moves, decelerates, and eventually reaches equilibrium 
in the neighborhood of the optimal sequence (at r = 
0). Basically, the selection eliminates weak binders in 
the population to improve the average binding energy, 
hence the selection threshold tq is decreased in the next 
round, while the change of r further selects sequences 
with better binding energies. Along with new variety 
generated by mutation, a propagating pulse results. 

We next investigate the dynamical behavior of the 
above evolution model analytically using a mean-field de- 
scription. It will be convenient to describe the dynamics 
in the mismatch space r. Let us first consider the con- 
tribution from point mutation. For a sequence of length 
L, "alphabet size" A (A = 4 for nucleotides) and r mis- 
matches, there are L ■ (A — 1) ways to mutate to a new 
sequence via a single point mutation. Among them, there 
are (L — r){A — 1) ways to increase r by one, and r ways 
to reduce r by one. Hence, a standard master equation 
can be written to describe the mutational dynamics of 
the distribution n(r,t) in the mean field limit N 1 
[ill H^ . . The effect of selection/amplification process 
can be phenomcnologically modeled by an additive term 
proportional to tfi~ 1 P(r, ro) for weak selection. Further 
taking the continuum limit in r (valid in the limit of large 
L and smooth population distribution), we arrive at the 
following mean- field description for n(r,t) 

dtn = d r [d r (D(r) n) — v(r)n] + U[r; n) ■ n(r, t) (1) 
U[r;n] = [<p- 1 P(r,r (t))-i\/T, (2) 

The first two terms on the right-hand side of Q result 
from the (conservative) mutational processes, with 

y 1 2 V A-1LJ ' w V A-ILJ 

(3) 



being the "diffusion coefficient" and "drift velocity" re- 
spectively 11], v = vqL. The r-dependences of v and 
D reflect the different phase space volume for the differ- 
ent mismatches. For example, the form of v{r) ensures 
that the the distribution approaches the maximum en- 
tropy point with r = mismatches by mutation 
alone. The third term in Eq. Q represents the effect 
of the selection/amplification, controlled by the growth 
function U defined in Eq. (0). (The factor r ~ 0(1) de- 
notes generation time.) Competition is explicitly mani- 
fested in the n dependence of the growth function U, via 
the threshold ro(t) which is determined from the con- 
dition <f> — J drP(r,ro(t))n(r,t). In Eq. J5J), an over- 
all shift in U by the constant —1 has been included to 
ensure that the population size N is conserved after se- 
lection/amplification, in accordance with the evolution 
process. This shift produces the desired competitive ef- 
fect that individuals which bind better than the threshold 
ro(t) are reproduced and those not meeting the threshold 
decay away. In the actual analysis, we will approximate 
the Fermi function P(r, ro) by a step function 0(ro — r), 
which turns out not to affect the qualitative behavior. 

We will see that the simplicity of the continuum mean- 
field equation provides an analytic understanding of 
generic features of the evolutionary dynamics, including 
the existence of the decelerating, shape-preserving popu- 
lation pulse; it also provides an analytical estimate of the 
smallness of the finite- AT correction. However that quan- 
titative differences do exist between our simplified de- 
scription and the breeding schemes employed in the sim- 
ulations and experiments, due to the phenomenological 
nature of simplified description, and the continuum ap- 
proximations used in both the mismatch space and time. 
(A quantitatively more accurate approach has recently 
been developed by Kloster and Tang Q.) 

We start with the simplest case of infinite sequence 
length (while keeping v a finite constant), yielding con- 
stant coefficients D(r) = D and v — v. Making the 
Ansatz in Eq. that the distribution n(r, t) = n[y(r, t)] 
where y = r — ro(t) and ro(t) = —ct for some constant 
speed c, we obtain the ODE 

Dn"{y) - [(c + v)n{y)]' + u(y)n(y) = 0, (4) 

where u(y) = (0~ 1 0(— y) — l) /r. A physically allowed 
solution of Eq. exists for every c > Co — ^, where 

c = y/ADtf- 1 - 1)77. 

In fact, the smallest possible speed c mul = c a — v is se- 
lected by the dynamics given a reasonably compact initial 
distribution. Here, velocity selection follows the famil- 
iar marginal stability mechanism [3|: The selected solu- 
tion n*(y) (with the velocity c min ) is the one that decays 
most sharply at the pulse front (i.e., the r < ro end) 
among all the allowed solutions. Thus, as the front of the 
distribution broadens from the initial condition, it first 
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reaches the asymptotic decay of n* . From then on, the 
distribution stops broadening and moves with the speed 
Cmin- Standard arguments show that this is equivalent 
to the condition that the front be marginally stable in 
the frame of reference moving with c m in- Note that as 
c oc oc y/v, c m i n < when the mutation rate v is 
sufficiently large, indicating the worsening of the overall 
affinity of the sequences due to accumulation of delete- 
rious mutations despite the presence of selection. These 
results apply also to the more general Fermi function, as 
it is only the asymptotic behavior of the growth term 
[i.e., U(r <C tq)] that governs velocity selection. 

In evolutionary dynamic s, the p opulation size N often 
plays a very important role |12Ul3| . To see how N enters, 
we note that the mean-field equation has an incon- 
sistency in that at the very front of the moving pulse, 
arbitrarily small n gets the benefit of exponential am- 
plification. But in reality, the number of individuals is 
discrete so that n should always be greater than 1/N. To 
deal with this problem, a cutoff procedure was proposed 
within the mean-field framework [l5l Il6j | . Here we em- 
ploy this procedure to estimate the effect of a finite pop- 
ulation on the evolutionary velocity ^| . Specifically, we 
modify the selection/ amplification term u(y) in Eq. Q 
to u(y)Q(n — N^ 1 ) (for y < 0). A direct extension of the 
approach in Ref. leads to <k /c ~ ^/ In 2 N for the 
fractional change in cq, which has the same scaling form 
as that for the Fisher equation 0|. To test this result, 
we ran simulations (with a modified mutational scheme 
to achieve a constant drift) to measure the propagating 
speed of the pulse for different population sizes N. Our 
finding of Sc/c w 0.06 between N = 5 x 10 3 and 5 x 10 8 is 
in line with the expectation and indicates that under typ- 
ical experimental conditions, the fluctuation effect due to 
finite population is insignificant. 

We next examine the more realistic situation of finite 
sequence length L. The important new effect is due to 
the r-dependence in the drift velocity v(r) [see Eq. J3J], 
which, as the population approaches towards r = 0, in- 
creasingly hinders its advance. This can already be ap- 
preciated if we assume a quasi steady-state dynamics and 
replace v in the formula for c m i n with v(r) = v—^r [where 
7 = according to ©]: We find a stable stationary po- 
sition, r E Q ps (v — cq)/^ where c m i n = 0. Here we identify 
this position naturally with the mean of the population 
r EQ ee Jrn E< ^{r)dr. 

To proceed with a more rigorous analysis, we neglect 
the r dependence in D which has only a small quantita- 
tive effect. Also, we assume that the equilibrium position 
^EQ ^ go ^lat the boundary condition at r — can 
be safely ignored. Returning to the mean-field equation 
Q, we use the same moving-pulse Ansatz as before ex- 
cept that we no longer fix a linear time dependence to 
the threshold ro(i). This Ansatz produces a linear ODE 



for r (t): 

ir (t)+r (t)= ir * Q (5) 

where r E is the equilibrium threshold, so that a static 
distribution can be achieved in the moving frame. The 
population mean f follows exactly the same motion (in 
fact, f(t) ps ?"o(i) except when selection is very weak), i.e., 
a single exponential with time constant 7 (which depends 
only on the point mutation rate vq). This is a generic 
result independent of the details of the fitness function, 
as long as a pulse solution exists for Eq. (JIJ. The decay 
constant 7 obtained from simulation of the discrete model 
is in quantitative agreement with the expectation (§^0) 
for weak selection (1 > <j> > 0.25); an example is shown 
in Fig. 0a). I n fact, the same qualitative result (i.e., the 
existence of a shape-preserving pulse) holds for strong 
selection as shown already in Fig. ^ where <f> = 0.1. 

The shape of the pulse, i.e. the equilibrium distribu- 
tion n E Q, is governed by the same ODE as Eq. Q, except 
that the constant velocity c is replaced by —7(2/ + r^). 
The resulting equation again has a continuum of phys- 
ically allowed solutions, each having a different shape 
and corresponding to a different equilibrium position r E( ^ 
(hence different r E Q). Here we have an interesting gener- 
alization of velocity selection to the selection instead from 
a continuum of decelerating pulses. Again, starting from 
a compact initial distribution, the dynamics selects the 
solution ^3 whose front (y < 0) decays most rapidly, (in 
this case a Gaussian falloff), whereas the other solutions 
have a power-law front. 

The r E Q extracted from the selected solution agrees 
well with its heuristic approximation of (y — co)M when 
7 « 1; see Fig. EJb). The theory is quantitatively ac- 
curate |2J| when the selection is not too strong (e.g., 
(f)^ 1 < 2.5). For very strong selection, the equilibrium 
threshold position r E( ^ approaches r = and the bound- 
ary condition there needs to be taken into account. When 
cq and 7 are expressed in terms of original parameters, 
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FIG. 2: (a) Evolution of the mean mismatch r(t) for cj> — 0.5. 
The equilibrium distribution is used as the initial condi- 
tion to mitigate transient effects. The solid line is a single- 
exponential fit using the theoretical value of 7 = |fo = .Of 33. 
(b) Equilibrium positions as a function of selection pressure 
(fi^ 1 — 1. The line is the theoretical estimate r ECJ = (u — Cq)/j, 
using a generation time r = 2.77 (obtained by calibrating cq 
from theory with that from simulation). 
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r E Q = (y — cq)/"/ suggests that for a population with se- 
quence length L 1, the population pulse could stall at 
r E Q 3> 1. In order for the population to reach the opti- 
mal at r w 0, we need to increase the selection strength 
(i.e., lowering (f>) or decrease the mutation rate so that 
(p^ 1 > 1 + vqtL/1, to overcome the bigger entropic bar- 
rier associated with longer sequences. 

As there have been extensive studies of evolution on 
various landscapes in the context of population genet- 
ics 0, 0, 0| , it is worth comparing the dynamical be- 
havior of competitive evolution with that of more com- 
mon evolutionary models. The traditional study of evo- 
lution focuses on fixed fitness landscapes, where every 
genotype (e.g., sequence S) has a predetermined absolute 
fitness value (i.e., the reproductive rate of the sequence 
S). Competitive evolution is different in that it is sub- 
ject to a dynamic fitness landscape. That is, the fitness is 
measured relative to a dynamic selection threshold and 
progress towards the best binding sequence occurs via 
competition among the currently existing genotypes - 
being better is all-important, not being best. This aspect 
of competitive evolution leads to qualitatively different 
dynamical behavior. For comparison, we can consider the 
simplest and most widely studied fixed landscape, i.e., 
the smooth "Mt. Fuji" landscape HI E! IU E2] , where 
each nucleotide contributes independently and additively 
to the fitness of the sequence, thus forming a landscape on 
which fitness rises steadily toward a single peak. For in- 
finite sequence length, the mean-field theory fails in that 
it produces an unphysical, run-away solution [15J due to 
the unlimited growth rate of n at the high fitness states 
[lil IrR , and a finite population has a traveling speed 
that is essentially proportional to population size [l7j . 
For finite sequence length, the finite population dynam- 
ics is orders-of-magnitude slower in reaching equilibrium 
than the (now non-divergent) mean-field prediction |l7j |. 
In contrast, finite population effects merely cause a small 
correction for the competitive evolutionary process. 

To summarize, we investigated the dynamics of com- 
petitive evolution in the context of molecular evolu- 
tion experiments. The major result concerns the ex- 
istence and properties of a shape-invariant population 
pulse which propagates towards an eventual equilibrium 
configuration. Analytical results on the motion of the 
pulse obtained from the mean-field equation are in good 
agreement with simulations. Also, corrections due to fi- 
nite population size are shown to be insignificant. An 
interesting aspect of our findings is the convergence of 
the evolution process to a solution far from optimal (i.e., 
^eq jj? ^ e se i ec ti on strength is not sufficiently 

strong or mutation rate not sufficiently low. In general, 
competitive evolution is rather different from the usual 
picture of climbing a fixed fitness landscape. This ap- 
proach may be applicable more generally, e.g. to natural 
evolution in cases where competition for scarce resources 
is the primary driving force, as an organism only needs 



to be more efficient than its competitors to win the battle 
for evolutionary survival. 
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